Combining logic and distributed representation
TODO: Andreas et al. (2016) [Andreas, J., Rohrbach, M., Darrell, T., & Klein, D. (2016). Learning to Compose Neural Networks for Question Answering. NAACL 2016.], the best long paper at NAACL 2016, is very interesting and might have a lasting impact.

With the rise of deep learning, researchers have paid increasing attention to distributed representations. Although this approach has been successful in many tasks, it has long been known to have serious drawbacks in areas where logic is strong, such as compositionality. Interest in combining the two has therefore grown significantly. This line of research can be framed within the larger topic of combining symbolic and sub-symbolic approaches, which was fashionable in the 1980s and 1990s (e.g. Hinton, 1986 [Hinton, G. E. (1986, August). Learning distributed representations of concepts. In Proceedings of the Eighth Annual Conference of the Cognitive Science Society (Vol. 1, p. 12).]; Ultsch, 1994 [Ultsch, A. (1994). The integration of neural networks with symbolic knowledge processing. In New Approaches in Classification and Data Analysis (pp. 445-454). Springer Berlin Heidelberg.]; Ultsch & Korus, 1995 [Ultsch, A., & Korus, D. (1995, November). Integration of neural networks with knowledge-based systems. In Proceedings of the IEEE International Conference on Neural Networks, 1995 (Vol. 4, pp. 1828-1833). IEEE.]). However, recent research has narrower aims and much more refined terminology. Different models have been proposed for specific tasks, such as knowledge base completion (Socher et al., 2013 [Socher, R., Chen, D., Manning, C. D., & Ng, A. (2013). Reasoning with neural tensor networks for knowledge base completion. In Advances in Neural Information Processing Systems (pp. 926-934).]) and small-scale reasoning (Rocktäschel et al., 2014 [Rocktäschel, T., Bosnjak, M., Singh, S., & Riedel, S. (2014). Low-Dimensional Embeddings of Logic. ACL 2014 Workshop on Semantic Parsing.]).
TODO: http://arxiv.org/pdf/1505.06816.pdf, http://arxiv.org/pdf/1505.07931.pdf, http://wwwhomes.uni-bielefeld.de/mkracht/html/model-final.pdf

Approaches

Direct mapping

Herbelot & Vecchi (2015) [Herbelot, A., & Vecchi, E. M. (2015). Building a shared world: Mapping distributional to model-theoretic semantic spaces.]: "We predict that there is a functional relationship between distributional information and vectorial concept representations in which dimensions are predicates and weights are generalised quantifiers."

Proposition completion

Hinton (1986) trained a neural network on two family trees and obtained interesting representations of family members as a by-product. The trees were turned into 104 propositions of the form (person1, relation, person2), of which 100 were used for training. For each proposition, the network was given the fillers of the first two roles and asked to predict the filler of the third. As of 2014, the paper had been cited more than 500 times. The approach seems limited in applicability and scalability.

Paccanaro & Hinton (2000) [Paccanaro, A., & Hinton, G. E. (2000). Learning Distributed Representations by Mapping Concepts and Relations into a Linear Space. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML-2000), Langley P. (Ed.), 711-718, Stanford University, Morgan Kaufmann Publishers, San Francisco.] proposed linear relational embedding, a somewhat simpler model. Their later paper [Paccanaro, A., & Hinton, G. E. (2000). Extracting distributed representations of concepts and relations from positive and negative propositions. In Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN 2000) (Vol. 2, pp. 259-264). IEEE.] extended the model to handle special cases where there is no answer or there are multiple answers.

Relation prediction

Bowman (2014) [Bowman, S. R. (2014). Can recursive neural tensor networks learn logical reasoning? In ICLR 2014.] employed a neural network with one hidden layer and one softmax layer to predict the relation between two phrases, which is one of entailment, reverse entailment, equivalence, alternation, negation, cover, and independence.

Relation classification

TODO: Socher et al. (2013)

Probabilistic inference informed by distributional similarity

Beltagy et al. (2013) [Beltagy, I., Chau, C., Boleda, G., Garrette, D., & Erk, K. (2013). Montague Meets Markov: Deep Semantics with Probabilistic Logical Form. In Proceedings of the Second Joint Conference on Lexical and Computational Semantics (*Sem-2013), 11–21.] tackled recognizing textual entailment and semantic textual similarity by casting both as probabilistic entailment in Markov logic. For example, the similarity between the two sentences

$S_1$: A man is slicing a cucumber.
$S_2$: A man is slicing a zucchini.

is judged as the average degree of mutual entailment ($S_1 \models S_2$ and $S_2 \models S_1$). Strictly speaking, $S_1$ does not entail $S_2$, nor vice versa. The authors fixed this by adding the rule

cucumber(x) → zucchini(x) | wt(cuc., zuc.)

which literally means "if something is a cucumber, it is also a zucchini" (with inference cost wt(cuc., zuc.)). Here wt(.) is a function of the cosine similarity between the two words.

TODO: Further development: Beltagy et al. (2014) [Beltagy, I., Roller, S., Boleda, G., Erk, K., & Mooney, R. J. (2014). UTexas: Natural Language Semantics using Distributional Semantics and Probabilistic Logic. SemEval 2014, 796.]
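The weighted-rule idea in the Beltagy et al. (2013) cucumber/zucchini example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' system: the toy word vectors are made up; the mapping from cosine similarity to wt(.) (here, clipping negative similarities to zero) is a hypothetical choice, since the article only says wt(.) is *a* function of cosine similarity; and the two entailment degrees, which the original approach obtains from Markov logic inference, are simply passed in as numbers.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def rule_weight(u, v):
    # Hypothetical wt(.): the source only says the weight is a function of
    # cosine similarity. This sketch just clips negative similarities to 0.
    return max(0.0, cosine_similarity(u, v))


def similarity_rule(lhs, rhs, vectors):
    """Build the weighted rule 'lhs(x) -> rhs(x) | w' as a plain string."""
    w = rule_weight(vectors[lhs], vectors[rhs])
    return f"{lhs}(x) -> {rhs}(x) | {w:.3f}", w


def mutual_entailment_similarity(p_1_to_2, p_2_to_1):
    """Average of the entailment degrees for S1 |= S2 and S2 |= S1.

    In the original approach these degrees come from Markov logic
    inference, which this sketch does not implement.
    """
    return (p_1_to_2 + p_2_to_1) / 2


# Toy 3-dimensional vectors, invented for illustration only.
vectors = {
    "cucumber": [0.9, 0.1, 0.3],
    "zucchini": [0.8, 0.2, 0.4],
}

rule, weight = similarity_rule("cucumber", "zucchini", vectors)
print(rule)
print(mutual_entailment_similarity(0.5, 1.0))  # 0.75
```

Clipping keeps the weight non-negative so that dissimilar word pairs contribute no rule at all; any other monotone mapping of the cosine similarity would fit the description in the article equally well.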